Supplementary Material for ECCV 2012 Paper: Extracting 3D Scene-consistent Object Proposals and Depth from Stereo Images
Authors
Abstract
This document accompanies the paper “Extracting 3D Scene-consistent Object Proposals and Depth from Stereo Images”. We provide implementation details and results of a competing object segmentation method [1]. We also show our results on the Middlebury Stereo Benchmark [2]. Note that the information given in this document is not necessary to understand the content of the main paper.

The Match Measure
In the definition of the photo consistency term in sec. 2 (Model), we described the match measure as a weighted sum of gradient and color differences. We now formulate this match measure. Note that our photo consistency term (which includes the match measure) is identical to that described in the PatchMatch Stereo paper [3], where the following information can be found as well. The function φ(q, q′) of eq. (3) computes the pixel dissimilarity between a pixel q of the left image and a pixel q′ of the right image as

\[ \phi(q, q') = (1 - \alpha) \cdot \min(\lVert I_q - I_{q'} \rVert, \tau_{col}) + \alpha \cdot \min(\lVert \nabla I_q - \nabla I_{q'} \rVert, \tau_{grad}). \tag{1} \]

Here, ||Iq − Iq′|| denotes the L1-distance between the colors of q and q′ in RGB space, and ||∇Iq − ∇Iq′|| represents the absolute difference of gray-value gradients. By using the gradient we can handle small radiometric differences between the left and right images (e.g., one image is slightly darker than the other). We truncate the pixel dissimilarities using the parameters τcol and τgrad. This truncation limits the influence of occluded pixels in the cost aggregation procedure. The parameters are set as described in [3], i.e., {α, τcol, τgrad} := {0.9, 10, 2}.

Depth Segmentation Algorithm
In the Object Image Computation step of sec. 3 (Optimization), we described a depth segmentation algorithm. We now explain this method in more detail. As described in the paper, we start by applying a mean-shift color segmentation algorithm [4] to the left input image. We then fit a disparity plane to each color segment using our initial disparity map F′.
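To make the match measure of eq. (1) concrete, the following is a minimal sketch for a single pixel pair, using the parameter values quoted from [3]. The function name `match_measure`, the argument layout, and the use of a scalar gradient per pixel are assumptions made for illustration; the text does not specify how the gray-value gradient is computed.

```python
# Parameter values quoted in the text (taken from [3]).
ALPHA = 0.9
TAU_COL = 10.0
TAU_GRAD = 2.0


def match_measure(color_q, color_q2, grad_q, grad_q2,
                  alpha=ALPHA, tau_col=TAU_COL, tau_grad=TAU_GRAD):
    """Pixel dissimilarity of eq. (1) for a left pixel q and a right pixel q'.

    color_q, color_q2: RGB triples of q and q' (assumed input layout).
    grad_q, grad_q2: gray-value gradients at q and q', given as scalars
    (assumption: the text does not say how the gradient is computed).
    """
    # L1 distance of the two colors in RGB space.
    color_diff = sum(abs(a - b) for a, b in zip(color_q, color_q2))
    # Absolute difference of the gray-value gradients.
    grad_diff = abs(grad_q - grad_q2)
    # Truncate both differences, then blend them with weight alpha.
    return ((1.0 - alpha) * min(color_diff, tau_col)
            + alpha * min(grad_diff, tau_grad))


if __name__ == "__main__":
    # Similar pixels: both differences stay below their truncation values.
    print(match_measure((10, 20, 30), (12, 20, 27), 1.0, 1.5))
    # Very different pixels (e.g., an occlusion): both terms are truncated.
    print(match_measure((0, 0, 0), (255, 255, 255), 0.0, 100.0))
```

With these parameter values the cost of a single pixel pair can never exceed 0.1 · 10 + 0.9 · 2 = 2.8, which is how the truncation bounds the contribution of occluded pixels.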
We now extract groups of segments that can be modeled well by the same disparity plane, using the following energy minimization approach. We first record all planes that have been computed in the plane fitting process. Our goal is to assign each color segment to one of these planes such that an energy is minimized. This energy consists of a data term and a smoothness term. For each pixel of the left image, the data term measures the absolute difference between the pixel’s disparity according to its assigned plane and its disparity in the initial disparity map F′. The smoothness term puts a constant penalty on spatially neighboring pixels assigned to different planes (Potts model). A parameter λ balances the data and smoothness terms. To approximate the energy minimum, we apply alpha-expansions of all planes present in the original solution, i.e., the one obtained after plane fitting. The computational complexity of this step is relatively low, as the energy can be optimized on the segment level (nodes in the graph correspond to whole segments). After running three iterations of the alpha-expansion algorithm, we obtain our depth segments by grouping all color segments that are assigned to the same disparity plane in the optimized solution. λ is the parameter that we vary to obtain depth segmentations of different granularities, such as those shown in fig. 4 of the paper.

Authors: Michael Bleyer, Christoph Rhemann and Carsten Rother. This work was supported in part by the Vienna Science and Technology Fund (WWTF) under project ICT08-019.

Fig. 1. Object segmentation result of “Blocks World” [1] on one of our images.

Results of “Blocks World” [1]
As stated in the paper, “Blocks World” [1] can be regarded as a competing object segmentation method, as this algorithm gives a mapping of image pixels to one of seven different classes.
We found that the publicly available code of [1] gives only very coarse segmentations on our test images, which are clearly inferior to our result. An example result for the “Parade” test set is shown in fig. 1.

Middlebury Results
We show the results on all four Middlebury images in fig. 2. Fig. 3 shows our ranking in the Middlebury Online Table [2]. Our method takes rank 13 out of 117 algorithms. It also performs better than our reimplementation of [3] (see fig. 3).

Generality
We train object stereo [5], our reimplementation of PatchMatch stereo [3], and our algorithm on the Middlebury evaluation set shown in fig. 2. We then apply the parameters that gave the highest Middlebury ranking to compute results on the 2005 test set. Quantitative results are shown in tab. 1. Our algorithm achieves the lowest error percentage on 3 of 6 images (bold numbers in tab. 1). Fig. 4 shows the corresponding disparity and error maps.

Method          Art    Books  Dolls  Laundry  Moebius  Reindeer  Avg. Error
Object Stereo   6.71   13.14  11.37  15.50    11.54    7.17      10.90
PM Stereo       9.11   9.31   5.16   12.53    9.51     4.79      8.40
Ours            8.36   7.68   5.17   11.76    9.25     5.16      7.90

Table 1. Generality of our approach. We plot the percentage of pixels having a disparity error > 1 pixel in unoccluded regions (black pixels in fig. 4). Our method performs better than object stereo [5] and PatchMatch stereo [3] on 3 of 6 images and achieves the lowest average error percentage.

Individual terms of the energy function
Tab. 2 shows the contribution of individual terms to the quality of disparity maps. Here, we test our method with the same parameters as in the previous experiment (= “All Terms On” in tab. 2). “Gravity Off” means that we set λgravity := 0, while the other parameters are set to the values of “All Terms On”. This disables the gravity constraint. “Intersection Off” means that we set λintersect := 0 to disable the intersection constraint. All other parameters are set to the values of “All Terms On”. We finally disable the bounding box tightness constraint by setting λtight := 0 (= “Tightness Off”). Switching off individual terms leads to higher error percentages, in general. Corresponding disparity and error maps are shown in fig. 5.

Setting           Art    Books  Dolls  Laundry  Moebius  Reindeer  Avg. Error
All Terms On      8.36   7.68   5.17   11.76    9.25     5.16      7.90
Gravity Off       8.30   8.20   5.20   11.84    9.51     5.70      8.30
Intersection Off  7.63   8.08   4.86   12.18    9.35     5.34      7.91
Tightness Off     8.00   8.10   5.26   12.15    9.56     6.19      8.21

Table 2. Influence of physics-based terms of our energy. We plot the error percentage in unoccluded regions (black pixels in fig. 5). Red numbers indicate a lower error percentage in comparison to “All Terms On”.
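The error measure reported in Tables 1 and 2 (percentage of pixels with a disparity error > 1 pixel in unoccluded regions) can be sketched as follows. This is an illustrative sketch only: the function name `bad_pixel_percentage`, the nested-list image layout, and the boolean mask convention (True = unoccluded) are assumptions, not the evaluation code used for the tables.

```python
def bad_pixel_percentage(disparity, ground_truth, unoccluded, threshold=1.0):
    """Percentage of unoccluded pixels whose disparity error exceeds threshold.

    disparity, ground_truth: 2D lists (rows of disparity values).
    unoccluded: 2D list of booleans, True where a pixel is evaluated
    (assumed mask convention).
    """
    evaluated = 0
    bad = 0
    for d_row, gt_row, mask_row in zip(disparity, ground_truth, unoccluded):
        for d, gt, visible in zip(d_row, gt_row, mask_row):
            if not visible:
                continue  # occluded pixels do not enter the statistic
            evaluated += 1
            if abs(d - gt) > threshold:
                bad += 1
    if evaluated == 0:
        raise ValueError("the mask selects no pixels")
    return 100.0 * bad / evaluated


if __name__ == "__main__":
    estimate = [[10.0, 12.5], [7.0, 3.0]]
    truth = [[10.0, 10.0], [9.0, 3.0]]
    mask = [[True, True], [True, False]]
    print(bad_pixel_percentage(estimate, truth, mask))
```

An error of exactly 1 pixel is not counted as bad here, following the strict “> 1 pixel” wording of the table captions.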
Similar resources
Extracting 3D Scene-Consistent Object Proposals and Depth from Stereo Images
This work combines two active areas of research in computer vision: unsupervised object extraction from a single image, and depth estimation from a stereo image pair. A recent, successful trend in unsupervised object extraction is to exploit so-called “3D scene-consistency”, that is enforcing that objects obey underlying physical constraints of the 3D scene, such as occupancy of 3D space and gr...
3D Scene and Object Classification Based on Information Complexity of Depth Data
In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...
3D Reconstruction by Stereo Imaging
Stereo vision is the process of extracting 3D depth information from multiple 2D images. Conventionally, two horizontally separated cameras are used to obtain two different perspectives on a scene. Because the cameras are separated, each feature in the scene appears at a different coordinate in both images. This difference between these coordinates is called the disparity and the depth of each ...
A Tensor Voting Approach for Multi-view 3D Scene Flow Estimation and Refinement
We introduce a framework to estimate and refine 3D scene flow which connects 3D structures of a scene across different frames. In contrast to previous approaches which compute 3D scene flow that connects depth maps from a stereo image sequence or from a depth camera, our approach takes advantage of full 3D reconstruction which computes the 3D scene flow that connects 3D point clouds from multi-...
Multiple View Object Cosegmentation Using Appearance and Stereo Cues
We present an automatic approach to segment an object in calibrated images acquired from multiple viewpoints. Our system starts with a new piecewise planar layer-based stereo algorithm that estimates a dense depth map that consists of a set of 3D planar surfaces. The algorithm is formulated using an energy minimization framework that combines stereo and appearance cues, where for each surface, ...
Light Field Assisted Stereo Matching using Depth from Focus and Image-Guided Cost-Volume Filtering
Light field photography advances upon current digital imaging technology by making it possible to adjust focus after capturing a photograph. This capability is enabled by an array of microlenses mounted above the image sensor, allowing the camera to simultaneously capture both light intensity and approximate angle of incidence. The ability to adjust focus after capture makes light field photogr...
Publication date: 2012